library(tidyverse)
library(tidytext)
Measure an artist’s character with:
- Lyrical complexity
- Musical complexity
- Genre bleed (shifting genres or integrating multiple genres)
- Number of writers (perhaps the number of writers who aren’t the artist)
- Label
There will be more in-depth analysis of archetype artists (chord data), but Spotify music data is available for all artists.
The units of the study will be artists, as they are the entities tracked over time. Beyond this, it will be useful to track the popular-music landscape as a way of explaining character change over time. This will involve tracking the most popular genres over time via the genre distribution on the Billboard Hot 100, the top Spotify streams, and the accolades given by the Grammy Awards.
This is the roadmap of how the data will be analyzed.
1. All valid artists
   a. Global music landscape
      - Distribution of genre over the Billboard Hot 100 and Spotify most streamed
      - Get the genre from the non-genre-specific Grammy awards
   b. Artist character change over time
      - Number of songwriters an artist works with over time
      - Trends of song metadata over time
      - Lyric complexity over time
2. Archetype artists
library(tools)
library(stringr)
billboardDf = read.csv("FrostData/billboardRankings.csv")
spotifyDf = read.csv("FrostData/spotifyStreams.csv")
riaaDf = read.csv("FrostData/RIAACertifications.csv")
grammyDf = read.csv("FrostData/grammyWinners.csv")
songSecsDf = read.csv("FrostData/songSections.csv")
songAttrsDf = read.csv("FrostData/songAttributes.csv")
colnames(billboardDf)[colnames(billboardDf) == "Weekly.rank"] = "BillboardWeekRank"
colnames(billboardDf)[colnames(billboardDf) == "Peak.position"] = "PeakPosBillboard"
colnames(billboardDf)[colnames(billboardDf) == "Weeks.on.chart"] = "WeeksOnBillboard"
colnames(billboardDf)[colnames(billboardDf) == "Date"] = "ReleaseDate"
colnames(billboardDf)[colnames(billboardDf) == "Writing.Credits"] = "WritingCredits"
colnames(spotifyDf)[colnames(spotifyDf) == "Track.Name"] = "Name"
colnames(spotifyDf)[colnames(spotifyDf) == "Position"] = "SpotifyWeekPosition"
colnames(riaaDf)[colnames(riaaDf) == "Status"] = "RiaaStatus"
colnames(riaaDf)[colnames(riaaDf) == "Title"] = "Name"
colnames(grammyDf)[colnames(grammyDf) == "Award"] = "GrammyAward"
colnames(grammyDf)[colnames(grammyDf) == "SongTitle"] = "Name"
colnames(songSecsDf)[colnames(songSecsDf) == "Song"] = "Name"
colnames(songAttrsDf)[colnames(songAttrsDf) == "Track"] = "Name"
riaaSuffixStrip = function(title){
new = gsub("\\(Feat.*", "", title)
return(new)
}
spotifySuffixStrip = function(title){ # TODO: handle more suffix variants (e.g. " - Radio Edit")
new = title %>%
sub(" - Remastered", "", .) %>%
sub(" - From.*", "", .) %>%
sub(" - Official.*", "", .)
return(new)
}
stringColStandardizer = function(oldText){
newText = oldText %>%
as.character(.)%>%
tolower(.) %>%
toTitleCase(.) %>%
trimws(.)
return(newText)
}
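As a quick, standalone check of the cleaning helpers, the snippet below repeats their logic (rewritten without pipes so it runs on its own) and applies it to made-up titles:

```r
library(tools) # toTitleCase

# Same logic as the helpers above, repeated so this snippet is self-contained
riaaSuffixStrip = function(title) gsub("\\(Feat.*", "", title)

spotifySuffixStrip = function(title){
  title = sub(" - Remastered", "", title)
  title = sub(" - From.*", "", title)
  sub(" - Official.*", "", title)
}

stringColStandardizer = function(x) trimws(toTitleCase(tolower(as.character(x))))

riaaSuffixStrip("Girls like You (Feat. Cardi B)") # "Girls like You " (trailing space)
spotifySuffixStrip("This Love - Remastered")      # "This Love"
stringColStandardizer("  hello world ")           # "Hello World"
```

Note that in the pipeline below riaaSuffixStrip runs after stringColStandardizer, so the trailing space it leaves is never trimmed; trimming again after stripping may be worth considering.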
TODO: deal with remixes (rows whose titles contain “- Remix” are filtered out below for now).
billboardDf[c(2,3,9,10, 12)] = lapply(billboardDf[c(2,3,9,10, 12)], stringColStandardizer) #not lyrics
billboardDf$ReleaseDate = as.Date(billboardDf$ReleaseDate, format = "%B %d, %Y")
spotifyDf[c(2, 3, 4)] = lapply(spotifyDf[c(2, 3, 4)], stringColStandardizer)
spotifyDf[2] = lapply(spotifyDf[2], spotifySuffixStrip)
grammyDf[c(5, 6)] = lapply(grammyDf[c(5, 6)],stringColStandardizer)
songSecsDf[c(3, 4)] = lapply(songSecsDf[c(3, 4)], stringColStandardizer)
songSecsDf[c(5, 6, 7)] = lapply(songSecsDf[c(5, 6, 7)],as.character)
songAttrsDf[c(3, 4, 13)] = lapply(songAttrsDf[c(3, 4, 13)], stringColStandardizer)
riaaDf[c(2,3,5)]=lapply(riaaDf[c(2,3,5)],stringColStandardizer)
riaaDf[c(2)]=lapply(riaaDf[c(2)],riaaSuffixStrip)
Note: it will likely be wise to group by artist and some unit of time (year, month, or full date) to build the picture of an artist’s character over time.
Let’s create a count of all writers on a song. (Note: if writers who are not the artist themselves end up excluded, some work will be needed to handle bands/groups.)
billboardDf = billboardDf %>%
mutate(numWriters = (str_count(WritingCredits, ",")+1)) %>%
mutate(numArtists =(str_count(Artists, ",")+1))
(billboardDf)
As a way of tracking an artist’s popularity, seeing how many songs an artist has on the chart in any given week can be useful.
artists = unique(unlist(strsplit(as.character(billboardDf$Artists), ", ")))
length(artists)
[1] 1076
#List of unique artists; may not be needed
#TODO: check this
separate_rows(billboardDf, Artists, sep = ", ")
chartedSongs = billboardDf %>%
separate_rows(., Artists, sep = ", ") %>%
aggregate(cbind(SongsOnChart = Name) ~ Artists + Week, data = ., FUN = length)
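To illustrate the split-then-count step, here is the same idea on a tiny, invented chart slice (the titles and week are made up):

```r
library(tidyr)

toy = data.frame(
  Artists = c("Maroon 5, Cardi B", "Maroon 5"),
  Week    = c("2018-06-09", "2018-06-09"),
  stringsAsFactors = FALSE
)

# One row per (artist, week) pairing
long = tidyr::separate_rows(toy, Artists, sep = ", ")

# Songs charted per artist per week
long$n = 1
counts = aggregate(n ~ Artists + Week, data = long, FUN = sum)
```

Here `counts` has one row per artist-week, with `n = 2` for Maroon 5 and `n = 1` for Cardi B.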
For Spotify streams, the sum of streams grouped by artist, year, and month can serve as a measure of the artist’s change in popularity.
spotifyDfNew = separate(spotifyDf, col= Week, into= c("Year", "Month", "Day"), sep ="-")
billboardDfNew = separate(billboardDf, col= Week, into= c("Year", "Month", "Day"), sep ="-")
monthlyArtistStreams = aggregate(x = spotifyDfNew["Streams"], by = list(Artist = spotifyDfNew$Artist, Year = spotifyDfNew$Year, Month = spotifyDfNew$Month), FUN = sum) # aggregate already returns a data.frame
spotifyDfNew[,c(8)] = sapply(spotifyDfNew[,c(8)], as.numeric)
billboardDfNew[,c(9)] = sapply(billboardDfNew[,c(9)], as.numeric)
spotifyDfNew = spotifyDfNew %>%
mutate(Week = (Day %/% 7) +1)
spotifyDfNew = spotifyDfNew[!grepl("- Remix", spotifyDfNew$Name),]
billboardDfNew = billboardDfNew %>%
mutate(Week = (Day %/% 7) +1)
billboardDfNew$PeakPosBillboard[is.na(billboardDfNew$PeakPosBillboard)] = billboardDfNew$BillboardWeekRank[is.na(billboardDfNew$PeakPosBillboard)]
billboardDfNew$WeeksOnBillboard[is.na(billboardDfNew$WeeksOnBillboard)] = 0
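As a sanity check on the `(Day %/% 7) + 1` encoding used above: it buckets days 1–6 into week 1, days 7–13 into week 2, and day 31 into a fifth bucket, giving a rough but monotone week-of-month index.

```r
# Week-of-month index as used above: integer-divide the day by 7 and add 1
weekOfMonth = function(day) (day %/% 7) + 1

weekOfMonth(c(1, 6, 7, 13, 14, 31))  # 1 1 2 2 3 5
```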
Here the functionality will be built to join all data on a chosen artist. For the examples to follow, the band “Maroon 5” will be used.
library(data.table)
library(plyr)
artistDataJoiner = function(artist){
drops = c("X", "Day", "Artists", "Artist", "Unnamed..0")
billboardSub = billboardDfNew[billboardDfNew$Artists %like% artist, ]
spotifySub = spotifyDfNew[spotifyDfNew$Artist %like% artist,]
spotifySub = spotifySub[ , !(names(spotifySub)) %in% drops]
billboardSub = billboardSub[ , !(names(billboardSub)) %in% drops]
new1 = join_all(list(billboardSub, spotifySub), by = c("Name","Features" ,"Year", "Month", "Week"), type = "full")
riaaSub = riaaDf[riaaDf$Artist %like% artist,]
grammySub = grammyDf[grammyDf$Artist %like% artist,]
songAttrsSub = songAttrsDf[songAttrsDf$Artist %like% artist,] %>%
mutate(DurationInSecs = Duration/1000) %>%
select(-c("Duration"))
songSecsSub = songSecsDf[songSecsDf$Artist %like% artist,]
riaaSub = riaaSub[ , !(names(riaaSub)) %in% drops]
grammySub =grammySub[ , !(names(grammySub)) %in% drops]
songAttrsSub = songAttrsSub[ , !(names(songAttrsSub)) %in% drops]
songSecsSub =songSecsSub[ , !(names(songSecsSub)) %in% drops]
new2 = join_all(list(new1, riaaSub, grammySub, songAttrsSub, songSecsSub), by = "Name", type = "full")
return(new2)
}
archArtist = artistDataJoiner("Maroon 5")
validAlbums = c("Red Pill Blues (Deluxe)", "v (Deluxe)", " Overexposed Track by Track", "Hands all over (Deluxe)", "it Won't be Soon Before Long.", "Songs About Jane")
archArtist = filter(archArtist, Album %in% validAlbums)
archArtist
#Measure of Popularity
Quantifying popularity will be done in multiple ways to account for the imperfections of each metric. There will be multiple popularity metrics, which can be compared and contrasted across songs. They are as follows:
- pop1 = sum(weeks / current)
- pop2 = sum(1 / current)
- pop3 = ln(101.1 - min(peak))
- pop4 = mean(ln(101.1 - current))
Pop1 rewards songs which reach their chart peak later in their run, and thus discriminates against tracks which peak immediately and dissipate quickly. Pop2 does not have an appropriate scale, since it treats the number 2 spot on the Hot 100 as half as valuable as the number 1 spot. Pop3 considers only the peak position on the chart, but scales it more appropriately than the first two. Pop4 uses the natural log scale to weigh differences in chart position more appropriately, and takes the mean of all the log chart positions to account for both longevity and position.
pop1Calc = function(df){
ret = sum(df$WeeksOnBillboard/df$BillboardWeekRank, na.rm = TRUE)
return(ret)
}
pop2Calc = function(df){
ret = sum(1/df$BillboardWeekRank, na.rm = TRUE)
return(ret)
}
pop3Calc = function(df){
if (all(is.na(df$PeakPosBillboard))){ # guard: min() on an all-NA group warns and returns Inf
return(NA)
}
ret = log(101.1 - min(df$PeakPosBillboard, na.rm = TRUE))
return(ret)
}
pop4Calc1 = function(val){
if (!is.na(val)){
ret = log(101.1- val)
}else{
ret = val
}
return(ret)
}
pop4Calc2 = function(df){
vals = sapply(df$BillboardWeekRank, pop4Calc1)
return(mean(vals, na.rm = TRUE))
}
getPopularityMetric = function(df){
new1 = df %>%
select(Name, BillboardWeekRank, WeeksOnBillboard, PeakPosBillboard) %>%
distinct() %>%
group_by(Name) %>%
do(data.frame( pop1=pop1Calc(.), pop2 = pop2Calc(.), pop3 = pop3Calc(.), pop4 = pop4Calc2(.))) %>%
distinct()
new2 = df %>%
select(Name, ReleaseDate) %>%
distinct()
new = merge(new1, new2, by = "Name", all = TRUE) # base merge takes all =, not type =
new = new[complete.cases(new), ] %>%
distinct() %>%
arrange(desc(ReleaseDate))
return(new)
}
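As a quick hand check of the four formulas, they can be evaluated on an invented four-week chart history (ranks 10, 5, 2, 8 with a peak of 2):

```r
# Invented chart history for one song: four weeks on the chart
toy = data.frame(
  BillboardWeekRank = c(10, 5, 2, 8),
  WeeksOnBillboard  = c(1, 2, 3, 4),
  PeakPosBillboard  = c(2, 2, 2, 2)
)

pop1 = sum(toy$WeeksOnBillboard / toy$BillboardWeekRank) # 2.5 (later weeks weigh more)
pop2 = sum(1 / toy$BillboardWeekRank)                    # 0.925
pop3 = log(101.1 - min(toy$PeakPosBillboard))            # log(99.1)
pop4 = mean(log(101.1 - toy$BillboardWeekRank))          # mean of the four log terms
```

Note how pop1 gives the week at rank 2 the largest weight (3/2 = 1.5) because it arrived late in the run, which is exactly the behavior described above.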
archArtistPop = getPopularityMetric(archArtist)
no non-missing arguments to min; returning Inf
NaNs produced
(warnings repeated for every song group with no Billboard chart data)
archArtistPop
test = artistDataJoiner("Twenty One Pilots")
unique(test$Album)
[1] "Trench" NA "Blurryface"
[4] "Vessel (with Bonus Tracks)" "Twenty One Pilots" "Vessel"
test
#Measure of Outside Influence
To consider how much outside influence went into the creation of a song, the number of writers who are not the artist themselves will be counted.
maroon5Members= c("Adam Levine", "Jesse Carmichael", "Mickey Madden", "James Valentine", "Matt Flynn", "PJ Morton", "Sam Farrar", "Ryan Dusick")
countNonBandWriters = function(df, bandMembers){
if (all(is.na(df$WritingCredits))){ # all() avoids the length > 1 condition warning
nonBandWriters = NA
} else{
writers = str_split(df$WritingCredits, ", ") %>%
unlist()
nonBandWriters = sum(!writers %in% bandMembers)
}
return(nonBandWriters)
}
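The same counting logic can be sketched standalone with base strsplit, again using all() to guard the NA check; the credit string and the helper name countNonBandWriters2 are invented for illustration:

```r
countNonBandWriters2 = function(credits, bandMembers){
  # NA credits mean the writer list is unknown, not zero outside writers
  if (all(is.na(credits))) return(NA)
  writers = unlist(strsplit(credits, ", "))
  sum(!writers %in% bandMembers)
}

countNonBandWriters2("Adam Levine, Max Martin, Shellback",
                     c("Adam Levine", "Jesse Carmichael"))  # 2 outside writers
```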
getOutsideInfluenceScore = function(df, bandMembers){
new1 = df %>%
select(Name, WritingCredits) %>%
distinct() %>%
group_by(Name) %>%
do(data.frame(nonBandMemberWriters = countNonBandWriters(., bandMembers))) %>%
distinct()
new2 = df %>%
select(Name, ReleaseDate) %>%
distinct()
new = merge(new1, new2, by = "Name", all = TRUE) # base merge takes all =, not type =
new = new[complete.cases(new), ] %>%
distinct() %>%
arrange(desc(ReleaseDate))
return(new)
}
archArtistInfluence = getOutsideInfluenceScore(archArtist, maroon5Members)
the condition has length > 1 and only the first element will be used (warning repeated)
archArtistInfluence
Some further preprocessing will be done to tidy the lyric data. Then the total number of words and the number of unique non-stop words will be counted, and the ratio of unique to total words will serve as a measure of lyrical repetition in the song. Furthermore, the average word length in the song will be recorded, as well as the number of words divided by the song length in seconds to get words per second.
library(quanteda)
noContraction = function(lyrics){
new = lyrics %>%
gsub("can't", "cannot", .) %>% #special n't
gsub("couldn't've", "could not have", .) %>%
gsub("mustn't've", "must not have", .) %>%
gsub("who'd've", "who would have", .) %>%
gsub("why'd", "why did", .) %>%
gsub("n't", " not", .) %>%
gsub("'ll", " will", .) %>%
gsub("'d", " would", .) %>%
gsub("'ve", " have", .) %>%
gsub("'re", " are", .) %>%
gsub("'cause", "because", .) %>%
gsub("there's", "there is", .) %>%
gsub("everyone's", "everyone is", .) %>%
gsub("she's", "she is", .) %>%
gsub("he's", "he is", .) %>%
gsub("it's", "it is", .) %>%
gsub("let's", "let us", .) %>%
gsub("how's", "how is", .) %>%
gsub("somebody's", "somebody is", .) %>%
gsub("someone's", "someone is", .) %>%
gsub("something's", "something is", .) %>%
gsub("that's", "that is", .) %>%
gsub("what's", "what is", .) %>%
gsub("when's", "when is", .) %>%
gsub("where's", "where is", .) %>%
gsub("who's", "who is", .) %>%
gsub("gonna", "going to", .) %>%
gsub("gotta", "got to", .) %>%
gsub("gimme", "give me", .)%>%
gsub("tryna", "trying to", .) %>%
gsub("i'm'a", "i am about to", .)%>%
gsub("i'm", "i am", .) %>%
gsub("y'all", "you all", .)
return(new)
}
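The long gsub chain can alternatively be driven by a lookup table, which makes adding a contraction a one-line change; a minimal sketch with only a few entries (order still matters, e.g. "i'm'a" must precede "i'm"):

```r
# Small illustrative table; the full chain above has many more entries
contractionTable = c(
  "can't" = "cannot",
  "i'm'a" = "i am about to",
  "i'm"   = "i am",
  "gonna" = "going to"
)

expandContractions = function(lyrics){
  for (i in seq_along(contractionTable)) {
    # fixed = TRUE: patterns are literal strings, not regexes
    lyrics = gsub(names(contractionTable)[i], contractionTable[[i]],
                  lyrics, fixed = TRUE)
  }
  lyrics
}

expandContractions("i'm gonna go")  # "i am going to go"
```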
individualLyric = function(df){
newDf = df %>%
unnest_tokens(word, Lyrics) %>%
anti_join(stop_words) %>%
distinct()
return(newDf)
}
getLyricalComplexity = function(df){
lyricsDf = df %>%
select(Name, Lyrics) %>%
distinct() %>% #now have the lyrics of all of the songs
mutate(Lyrics = tolower(Lyrics)) %>% #Get all lyrics to be lower case
mutate(Lyrics = noContraction(Lyrics)) %>% #There are now no contractions besides possessives
#mutate(unlist(str_split(Lyrics, "\n", n = 1))) %>% # TODO: strip the song title from the start of the lyrics
mutate(Lyrics = gsub("[^a-z ]", " ", Lyrics)) #Remove what is not an english letter
individual = individualLyric(lyricsDf)
fullLyrics = individual %>%
group_by(Name) %>%
tally(name = "uniqueNonStop")
totalLyrics = lyricsDf %>%
unnest_tokens(word, Lyrics) %>%
group_by(Name) %>%
tally(name = "totalWords")
avgLenAndSyls = individual %>%
group_by(Name) %>%
distinct() %>%
mutate(wordLength = nchar(word))%>%
mutate(wordSyllables = nsyllable(word, syllable_dictionary = quanteda::data_int_syllables,
use.names = FALSE)) %>%
select(Name, wordLength, wordSyllables) %>%
ddply(.,~Name, summarize, avgWordLen = mean(wordLength, na.rm = TRUE), avgSyllables = mean(wordSyllables, na.rm = TRUE))
#Get the total words by the duration
wordsByTime = df %>%
select(Name, DurationInSecs) %>%
distinct() %>%
full_join(totalLyrics, by = "Name") %>%
mutate(wordsPerSec = totalWords/DurationInSecs) %>%
select(Name, wordsPerSec)
new3 = join_all(list(fullLyrics, totalLyrics, avgLenAndSyls, wordsByTime), by = "Name")
#new3 = join_all(list(fullLyrics, totalLyrics, avgLen), by = "Name")
new3$uniqueToTotalRatio = new3$uniqueNonStop/new3$totalWords # ratio of unique non-stop words to total words
full = new3[complete.cases(new3), ]
print(full)
full[c(2,3,4,5,6,7)] = sapply(full[c(2,3,4,5,6,7)], scale)
scores = full %>%
do(data.frame(Name = full$Name, lyricalComplexity = full$avgWordLen + full$avgSyllables + full$uniqueToTotalRatio + full$wordsPerSec))
datesDf = df %>%
select(Name, ReleaseDate) %>%
distinct()
final = merge(scores, datesDf, by = "Name", all = TRUE) # base merge takes all =, not type =
final = final[complete.cases(final), ] %>%
distinct() %>%
arrange(desc(ReleaseDate))
return(final)
}
lyricalComplexDf = getLyricalComplexity(archArtist)
Joining, by = "word"
lyricalComplexDf
#Measure of Musical Complexity
Previously, the music data was held for each section of each song, but it will need to be aggregated to the song level. For each song, measures of the number of unique chords, non-diatonic chords, extended chords, the number of sections, and whether any section endings differ will be held.
countUniqueChords = function(df){
uniqueChords = unique(unlist(strsplit(paste(paste("-", unlist(df["Progression"]), sep = ""), collapse = ""), "-"))[-1])
numUniqueChords = length(uniqueChords[uniqueChords != "NA"])
return(numUniqueChords)
}
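countUniqueChords above builds one long dash-joined string and re-splits it; an equivalent standalone version operating directly on a vector of progression strings (toy progressions here, not the real chord data) may be easier to read:

```r
countUniqueChordsAlt = function(progressions){
  # Split each "A-B-C" progression into chords, pool them, and count distinct ones
  chords = unlist(strsplit(as.character(progressions), "-"))
  chords = chords[chords != "" & chords != "NA"]
  length(unique(chords))
}

# Two made-up section progressions sharing most chords
countUniqueChordsAlt(c("C-G-Am-F", "C-G-F-F"))  # 4 (C, G, Am, F)
```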
checkDifferentEnd = function(df){
vals = df$EndDifferent
endDif = sum(!is.na(vals) & vals != "")
return (endDif)
}
getMusicComplexity = function(df){
new1 = df %>%
select(Name, Section, Progression,EndDifferent, DurationInSecs, NumSectionChords, nonDiatonicChords, extendedChords) %>%
distinct()
new2 = new1 %>%
select(Name, nonDiatonicChords, extendedChords) %>%
group_by(Name) %>%
summarise_all(sum)
new3 = df %>%
group_by(Name) %>%
do(data.frame(numUniqueChords = countUniqueChords(.)))
new4 = new1 %>%
group_by(Name) %>%
do(data.frame(endDif = checkDifferentEnd(.)))
full = join_all(list(new2, new3, new4), by = "Name", type = "full")
full = full[complete.cases(full), ]
full[c(2,3,4,5)] = sapply(full[c( 2,3,4,5)],scale)
print(full)
scores = full %>%
do(data.frame(Name = full$Name , musicalComplexity = 2 * (full$nonDiatonicChords) + (full$extendedChords) + (full$numUniqueChords) + 5 * (full$endDif)))
datesDf = df %>%
select(Name, ReleaseDate) %>%
distinct()
final = merge(scores, datesDf, by = "Name", all = TRUE) # base merge takes all =, not type =
final = final[complete.cases(final), ] %>%
distinct() %>%
arrange(desc(ReleaseDate))
return(final)
}
musicComplexDf = getMusicComplexity(archArtist)
musicComplexDf
Now all of the smaller metric datasets will be joined.
fullMetricsDataSet = function(popScores, origScores, lyricComp, musicComp){
full = join_all(list(as.data.frame(popScores), as.data.frame(origScores), as.data.frame(lyricComp), as.data.frame(musicComp)), by = "Name", type = "full")
full[c(2,3,4,5,7,8,9)] = sapply(full[c( 2,3,4,5,7,8,9)],scale)
return(full)
}
artistMetricDf = fullMetricsDataSet(archArtistPop, archArtistInfluence, lyricalComplexDf, musicComplexDf)
artistMetricDf
#Applicable Functionality
Now that all of the metric data is collected along with the original data on the artist’s songs, tracks can be compared directly to each other in meaningful ways.
#Pop1 can be tracked over time because weeks on the chart is a changing metric
#Join all of the originality and complexity metrics because they are attached to the song, not moving by week
compareTracks = function(songs, artistDf, metricDf){
songs = sapply(songs, stringColStandardizer) # sapply returns a character vector for the %in% below
artistDfSub = artistDf[artistDf$Name %in% songs, ] %>%
select(Name, BillboardWeekRank, WeeksOnBillboard, Year, Month, Week, ReleaseDate, Genre, Features, Songwriter, numWriters, numArtists, Streams, RiaaStatus, Label, GrammyAward, GrammyYear, Explicit, Album, Acousticness, Danceability, Energy, Instrumentalness, Liveness, Loudness, Mode, Popularity, Speechiness, Tempo, TimeSignature, Valence, DurationInSecs) %>%
distinct()
print(metricDf)
#metricDfSub = metricDf[c(-2, -3, -4)]
full = merge(artistDfSub, metricDf, by = "Name", all = TRUE) # base merge takes all =, not type =
full = subset(full, !is.na(full[,3]))
#Here is the weekly contribution to the pop1 score that was calculated earlier.
#Shows how the contribution to this metric changes over time of a song on the chart
full$pop1SWeekScore = full$WeeksOnBillboard / full$BillboardWeekRank
song1 = paste(songs[-length(songs)], collapse = ", ")
song2 = songs[length(songs)] # the last song in the list
graphic1 = ggplot(full, aes(x=WeeksOnBillboard, y = pop1SWeekScore, color = Name)) +
geom_point() + geom_line() + labs(title = paste("Weekly Contribution to Pop1 Score on the Billboard Hot 100 for", song1, "and", song2))
graphic2 = ggplot(full, aes(x=WeeksOnBillboard, y = BillboardWeekRank, color = Name)) +
geom_point() + geom_line() + labs(title = paste("Rank on the Billboard Hot 100 for", song1, "and", song2, "by Week")) + scale_y_reverse()
print(graphic1)
print(graphic2)
print(full)
#outsideInfTableDf = full %>%
# select(Name, ReleaseDate, Label, nonBandMemberWriters) %>%
# distinct()
#popTableDf = full %>%
# select(Name, ReleaseDate, pop1, pop2, pop3, pop4, GrammyAward, RiaaStatus) %>%
# distinct()
#complexityTableDf = full %>%
# select(Name, ReleaseDate, lyricalComplexity, musicalComplexity) %>%
# distinct()
}
#compareTracks("She will be loved", "Girls like you", archArtist, artistMetricDf)
compareTracks(c("She will be loved", "Harder to Breathe", "Wait", "Sugar"), archArtist, artistMetricDf)
NA
All of the previously created functionality should be applicable to any valid artist for whom data is available. The full pipeline of function calls is below.
#Need to pass the artist, the band members, and the valid albums
completeArchDf = function(artist, members, validAlbs){
artistDf = artistDataJoiner(artist) %>% filter(Album %in% validAlbs)
print(artistDf)
archArtistPop = getPopularityMetric(artistDf)
archArtistInfluence = getOutsideInfluenceScore(artistDf, members)
lyricalComplexDf = getLyricalComplexity(artistDf)
musicComplexDf = getMusicComplexity(artistDf)
artistMetricDf = fullMetricsDataSet(archArtistPop, archArtistInfluence, lyricalComplexDf, musicComplexDf)
relDateDf = artistDf %>% select(Name, ReleaseDate)
fullMetric = merge(relDateDf, artistMetricDf, by ="Name")
#Fill musical complexity NAs with 0 because it is standardized
fullMetric$musicalComplexity[is.na(fullMetric$musicalComplexity)] = 0
fullMetric$totalComplexity = fullMetric$lyricalComplexity + fullMetric$musicalComplexity
fullMetric = fullMetric[complete.cases(fullMetric), ]
graph = ggplot(fullMetric, aes(x = ReleaseDate, y = totalComplexity)) + geom_point() + geom_smooth(method = "lm") + labs(title = paste0(artist, "'s Song Complexity over Time"), y = "Song Complexity", x = "Release Date") # labs() takes x/y, not xlab/ylab
print(fullMetric)
print(graph)
}
completeArchDf("Maroon 5", c("Adam Levine", " Jesse Carmichael", "Mickey Madden", "James Valentine", "Matt Flynn", " PJ Morton", "Sam Farrar", "Ryan Dusick"), c("Red Pill Blues (Deluxe)", "v (Deluxe)", " Overexposed Track by Track", "Hands all over (Deluxe)", "it Won't be Soon Before Long.", "Songs About Jane"))
no non-missing arguments to min; returning Inf
NaNs produced
(warnings repeated for every song group with no Billboard chart data)
the condition has length > 1 and only the first element will be used (warning repeated)
Joining, by = "word"
Error in FUN(X[[i]], ...) : object 'ReleaseDate' not found
TODO: order by release date in the smaller data sets for narrative purposes.